ABSTRACT

A general computer program is described that will
compute asymptotic standard errors and carry out significance tests
for an endless variety of (standard and) nonstandard large-sample
statistical problems, without requiring the statistician to derive
asymptotic standard error formulas. The program assumes that the
observations have a multinormal distribution and that the null
hypothesis to be tested has the form xi = 0 where xi is some function
(to be specified by the user) of means, variances, and covariances.
Only minor programming is required to replace either or both of these
assumptions. The package performs the automated hypothesis testing
and consists of a main program and six subroutines. The package is
written in Fortran IV. (Author/ON)
Automated Hypothesis Tests and Standard Errors for Nonstandard Problems*

Frederic M. Lord

Abstract

A general computer program is described that will compute asymptotic
standard errors and carry out significance tests for an endless variety
of (standard and) nonstandard large-sample statistical problems, without
requiring the statistician to derive asymptotic standard error formulas.
The program assumes that the observations have a multinormal distribution
and that the null hypothesis to be tested has the form $\xi = 0$ where $\xi$
is some function (to be specified by the user) of means, variances, and
covariances. Only minor reprogramming is required to replace either or
both of these assumptions.

Introduction 

A general computer program can be written that will compute asymptotic
standard errors and carry out significance tests for an endless variety
of (standard and) nonstandard large-sample statistical problems, without
requiring the statistician to derive asymptotic standard error formulas.
This report describes one such computer program (currently available from
the writer). The program, written by M. Stocking, is described in the
section Description of Computer Package.

As presently implemented, the program assumes that the observations
have a multinormal distribution and that the null hypothesis to be tested
has the form $\xi = 0$ where $\xi$ is some function (to be specified by the
user) of means, variances, and covariances. As will be pointed out, only
minor reprogramming is required to replace either or both of these
assumptions.

The present program is not set up to deal with vector hypotheses of
the form $\boldsymbol{\xi} = \mathbf{0}$. Possible program changes to
accommodate vector hypotheses will be obvious to the statistician (see
Moran, 1970, section 3).

Without user action, the program accommodates two samples, each
composed of any number of observations on a maximum of 10 random variables.
More samples (up to 20) with fewer random variables can be accommodated if
the user sets all population covariances between variables from different
samples equal to zero. In addition, the maximum of 10 random variables
per sample can be increased, if desired.

*Research reported in this paper has been supported by grant GB-3278IX
from the National Science Foundation.






Shenton, Bowman, and Sheehan (1971) describe computer procedures much 
more powerful and more versatile than those described here, with the 
exception that their program deals with univariate problems only. 

Asymptotic Hypothesis Testing 



Consider a sample consisting of a matrix $X$ of observations drawn
from the distribution $f(x \mid \xi, \theta)$, where $\theta$ is a vector of
"nuisance" parameters mathematically independent of $\xi$ and of each
other. We wish to test the composite hypothesis $H_0 : \xi = 0$. Let
$\hat\xi$ and $\hat\theta$ be the maximum likelihood estimates of $\xi$ and
$\theta$ obtained without the restriction $\xi = 0$. $H_0$ can usually be
tested by computing $\hat\xi/\hat\sigma_{\hat\xi}$, where
$\hat\sigma^2_{\hat\xi}$ is the asymptotic sampling variance of $\hat\xi$
with $\hat\xi$ and $\hat\theta$ substituted for the unknown parameters
$\xi$ and $\theta$. The rejection region for $H_0$ consists of one or both
tails of the asymptotic distribution of $\hat\xi/\hat\sigma_{\hat\xi}$ under
$H_0$. Under regularity conditions (Cramér, 1946, section 33.3), this
distribution is normal with zero mean and unit variance.

In a well-defined sense (Wald, 1943; Moran, 1970), the test just
described is asymptotically optimal (locally asymptotically most powerful).
If $\hat\xi$ and $\hat\theta$ are consistent estimators but not maximum
likelihood estimators, the test will have the same significance level, but
the power of the test (locally) may be low. A regularity condition worth
noting is that $\xi = 0$ must not be a boundary of the range of $\xi$.
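
The last step of the test described above, referring the ratio of the estimate to its asymptotic standard error to a standard normal distribution, can be sketched in a few lines. This is a minimal illustration in Python, not the FORTRAN IV code of the package; the function names `normal_percentile` and `large_sample_test` are hypothetical, and the standard error is assumed to have been computed already.

```python
import math

def normal_percentile(z):
    """Percentile (0 to 100) at which z falls in a standard normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def large_sample_test(xi_hat, se_xi_hat):
    """Return the ratio xi_hat / se_xi_hat and its standard normal percentile.

    xi_hat    : estimate of xi obtained without the restriction xi = 0
    se_xi_hat : asymptotic standard error of xi_hat (assumed already computed)
    """
    ratio = xi_hat / se_xi_hat
    return ratio, normal_percentile(ratio)
```

For a two-tailed test at the 5 percent level, the null hypothesis would be rejected when the percentile falls below 2.5 or above 97.5.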

Input 







In the present implementation two alternative methods are provided
for data input. If the sample means $m \equiv \{m_i\}$ and the sample
variance-covariance matrix $S \equiv \|s_{ij}\|$ are available, these may be
used as input. Otherwise, the raw observations are input and $m$ and $S$
are automatically computed.
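
As an illustration of the second input method, the sketch below computes the sample means and the upper triangle of the sample variance-covariance matrix from raw observations. It is written in Python rather than FORTRAN, the function name `means_and_covariances` is hypothetical, and the divisor N (the maximum likelihood form) is an assumption: the source does not state whether the program divides by N or by N - 1.

```python
def means_and_covariances(rows):
    """Compute sample means and the upper triangle of the covariance matrix.

    rows : list of observations, each a list of values on the same variables.
    Assumption: sums of cross products are divided by N, not N - 1.
    """
    n_obs = len(rows)
    n_var = len(rows[0])
    m = [sum(r[i] for r in rows) / n_obs for i in range(n_var)]
    S = [[0.0] * n_var for _ in range(n_var)]
    for i in range(n_var):
        for j in range(i, n_var):  # upper triangle only, i <= j
            S[i][j] = sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / n_obs
    return m, S
```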

In order to use the packaged computer program, the statistician
must specify what function $\xi = \xi(\tau)$ or $\xi = \xi(\mu, \Sigma)$ is to
be tested, where $\xi$ is differentiable and $\tau$ is a vector whose
elements $\tau_g = \tau_g(\mu, \Sigma)$ are differentiable functions of the
mean vector $\mu \equiv \{\mu_i\}$ and the variance-covariance matrix
$\Sigma \equiv \|\sigma_{ij}\|$ of the multinormal population. (In some
problems, for example in testing hypotheses about correlation coefficients,
it is often convenient for the statistician to work with $\tau$, rather than
with $\mu$ and $\Sigma$ only.) Note that $\hat\xi = \xi(\hat\tau)$ or
$\hat\xi = \xi(\hat\mu, \hat\Sigma)$ and that
$\hat\tau_g = \tau_g(\hat\mu, \hat\Sigma)$, where each "hat" denotes a
maximum likelihood estimate (for a rigorous treatment, see Zehna, 1966).
In the present implementation, the statistician specifies the function
$\xi$ simply by writing a FORTRAN arithmetic assignment statement. On the
left of the equal sign he writes XIHAT and on the right he writes down the
explicit arithmetic expression for $\xi(\hat\tau)$ or for
$\xi(\hat\mu, \hat\Sigma)$. He inserts this FORTRAN statement at a
prearranged point in a subroutine.

If $\hat\mu = m$ and $\hat\Sigma = S$, as in many simple problems, the user
simply provides the explicit arithmetic expression for
$\xi(\hat\mu, \hat\Sigma)$ and the computer proceeds directly to compute
$\hat\xi$, $\hat\sigma_{\hat\xi}$, $\hat\xi/\hat\sigma_{\hat\xi}$, and
finally the percentile at which $\hat\xi/\hat\sigma_{\hat\xi}$ falls in a
standard normal distribution. If the user defines $\xi$ in terms of the
functions $\tau_g$, then he must supply FORTRAN arithmetic assignment
statements defining each $\hat\tau_g$ as a function of other $\hat\tau_h$
and the sufficient statistics $m$ and $S$.
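
To make the idea of specifying the function through intermediate quantities concrete, here is a hypothetical example in Python (the package itself expects FORTRAN assignment statements). The hypothesis, equality of two correlations within one sample, and the function name `xi_hat_equal_correlations` are illustrative choices, not taken from the source. The two correlations play the role of the intermediate functions of the means and covariances.

```python
import math

def xi_hat_equal_correlations(m, S):
    """Hypothetical xi for H0: rho_12 = rho_13 within a single sample.

    m : list of estimated means (not used by this particular hypothesis)
    S : estimated variance-covariance matrix; only elements with
        second subscript >= first subscript (upper triangle) are read.
    """
    tau1 = S[0][1] / math.sqrt(S[0][0] * S[1][1])  # correlation of variables 1 and 2
    tau2 = S[0][2] / math.sqrt(S[0][0] * S[2][2])  # correlation of variables 1 and 3
    return tau1 - tau2
```

Any differentiable expression in the means and covariances could be substituted for the two correlations.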



If there are some restrictions on the parameters (other than that
$\xi = 0$ under $H_0$), then $\hat\mu \ne m$ or $\hat\Sigma \ne S$. Since
$\hat\mu$ and $\hat\Sigma$ are estimated without the restriction $\xi = 0$,
$\hat\mu$ and $\hat\Sigma$ will usually have a simpler mathematical form
than would estimates obtained with this restriction. Whenever
$\hat\mu_i \ne m_i$ or $\hat\sigma_{ij} \ne s_{ij}$, the statistician must
insert in the subroutine arithmetic assignment statements defining
$\hat\mu_i$ or $\hat\sigma_{ij}$ as functions of $m$ and $S$, or if
convenient as functions of $\hat\mu$'s and $\hat\sigma$'s. If formulas for
efficient estimators of $\mu$ and $\Sigma$ are not known to the
statistician, he may substitute other consistent estimators, in which
case (under regularity conditions) the significance test may be a
conservative one in the sense that it rejects the null hypothesis less
often than it should.

Automated Procedure 

The program input described in the preceding section enables the
computer to compute $\hat\xi$ for any values of $m$ and $S$. The main virtue
of the program is that the statistician does not have to derive an explicit
formula for the asymptotic sampling variance of $\hat\xi$. This is avoided
because the computer uses the general formula



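$$
\hat\sigma^2_{\hat\xi}
  = \sum_i \sum_j \frac{\partial\hat\xi}{\partial m_i}\,
      \frac{\partial\hat\xi}{\partial m_j}\,\operatorname{cov}(m_i, m_j)
  + \sum_g \sum_h \sum_i \sum_j \frac{\partial\hat\xi}{\partial s_{gh}}\,
      \frac{\partial\hat\xi}{\partial s_{ij}}\,\operatorname{cov}(s_{gh}, s_{ij})
  \tag{1}
$$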



where cov denotes a covariance with sample estimators substituted for
population values. The required derivatives are approximated numerically
by the computer (see Description of Computer Package, section 2), using only
the formula for $\hat\xi$, without need for symbolic differentiation (the
user should check that $\xi$ is differentiable, however). The necessary
sampling covariances are automatically computed from standard formulas
built into the computer program:



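$$
\operatorname{cov}(m_i, m_j) = \hat\sigma_{ij}/N, \qquad
\operatorname{cov}(s_{gh}, s_{ij})
  = (\hat\sigma_{gi}\hat\sigma_{hj} + \hat\sigma_{gj}\hat\sigma_{hi})/N,
  \tag{2}
$$

where $N$ is the number of observations, and either
$\hat\sigma_{ij} = s_{ij}$ or else an arithmetic assignment statement
defining $\hat\sigma_{ij}$ has been provided in the program by the
statistician.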

If some parameters and statistics other than means, variances, and 
covariances are to be used and some distribution other than multinormal 
is to be assumed, it is only necessary to provide appropriate formulas 
for computing the maximum likelihood estimates and to insert appropriate 
formulas for sampling covariances in place of (2). 
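
The automated procedure of this section can be sketched for a single sample as follows. This is a Python illustration under stated assumptions, not the package's FORTRAN code: the names `partials` and `asymptotic_variance` are hypothetical, a fixed increment of 0.001 is used for the central differences, the estimated covariances are taken equal to the sample values, and the sums over covariance elements run over the distinct upper-triangle elements (first subscript not exceeding the second), which is one reading of how formula (1) is indexed.

```python
def partials(xi, m, S, h=1e-3):
    """Central-difference partial derivatives of xi(m, S).

    Returns (d_m, d_s): d_m[i] is the derivative with respect to m[i];
    d_s[(g, k)] is the derivative with respect to S[g][k], g <= k.
    """
    n = len(m)
    d_m = []
    for i in range(n):
        up, dn = list(m), list(m)
        up[i] += h
        dn[i] -= h
        d_m.append((xi(up, S) - xi(dn, S)) / (2.0 * h))
    d_s = {}
    for g in range(n):
        for k in range(g, n):
            up = [row[:] for row in S]
            dn = [row[:] for row in S]
            up[g][k] += h
            dn[g][k] -= h
            d_s[(g, k)] = (xi(m, up) - xi(m, dn)) / (2.0 * h)
    return d_m, d_s

def asymptotic_variance(xi, m, S, n_obs, h=1e-3):
    """Asymptotic variance of xi(m, S) under multinormality, one sample."""
    n = len(m)
    d_m, d_s = partials(xi, m, S, h)

    def sig(a, b):
        # Read the covariance from the upper triangle only.
        return S[min(a, b)][max(a, b)]

    var = 0.0
    # First sum: cov(m_i, m_j) = sigma_ij / N
    for i in range(n):
        for j in range(n):
            var += d_m[i] * d_m[j] * sig(i, j) / n_obs
    # Second sum: cov(s_gh, s_ij) = (sigma_gi sigma_hj + sigma_gj sigma_hi) / N
    for (g, k), d_gk in d_s.items():
        for (i, j), d_ij in d_s.items():
            var += d_gk * d_ij * (sig(g, i) * sig(k, j) + sig(g, j) * sig(k, i)) / n_obs
    return var
```

As a check on the sketch, taking the function to be the difference of two means gives the familiar variance of a difference between correlated means, and taking it to be a single variance gives twice the squared variance divided by the number of observations.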



Illustrative Problems

The computer program has been checked out by applying it to numerical
examples testing some two dozen different null hypotheses for which the
numerical answers could be verified. A partial listing may suggest to
the reader the scope of the program. Primes are used to distinguish
parameters of two different populations; $\sigma_i$ denotes the standard
deviation of variable $i$.








In a Monte Carlo study, 1000 values of [...] and their probability
levels were computed using the program described, where

[...] $- \rho_{15})\,\rho_{34}$ [...]

The time required on a 360/65 for all 1000 was about 80 seconds.
Description of Computer Package

1. General Description

The package that performs the automated hypothesis testing consists
of a main program and six subroutines. The package is written in FORTRAN IV
in double precision and, when compiled on an IBM 360/65 using the G compiler,
takes 56 K. This size will be increased if the user either enlarges the
number of variables in a sample (see section 5) or modifies subroutine
XIHAT substantially (see section 4).

LASAHT    Main program for LArge SAmple Hypothesis Testing. LASAHT
          reads input, computes derivatives, and calls subroutines
          COVM, COVS, IN, MATOUT, and XIHAT.

IN        Subroutine called by LASAHT if input is raw data to read
          data and compute means and covariances.

COVM      Subroutine called by LASAHT to compute first sum in
          equation (1).

COVS      Subroutine called by LASAHT to compute second sum in
          equation (1).

MATOUT    Subroutine called by LASAHT to print upper triangular
          matrices.

FILL      Subroutine called by MATOUT.

XIHAT     Subroutine in which user will supply formulas for $\hat\xi$ and
          any restrictions (see section 4).
2. Computation of Partial Derivatives

The partial derivatives are computed numerically. All elements in
$\hat\tau$ are held constant except one. This element is increased by some
increment, and a value for $\hat\xi$ is computed; the element is then
decreased from its original value by the same increment, and another value
of $\hat\xi$ is computed. The derivative is approximated as the difference
between these two values of $\hat\xi$ divided by twice the increment.

In computations with fixed word length, the optimum size of the
increment is not known. A repetitive procedure is used in which the size
of the increment is decreased up to the point where the difference between
two successively computed approximations to the derivative increases (due
to truncation error). The first value used for the increment is 0.001.
In successive repetitions, this is decreased by a factor of 10, up to
a maximum of five times. If the differences between successive
approximations do not increase, the last derivative computed is used.
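
The repetitive procedure just described might look like the following sketch in Python. The names `central_difference` and `adaptive_derivative` are hypothetical, and one detail is an assumption: the text does not say which approximation is kept once the difference between successive approximations increases, so the sketch keeps the one computed just before the increase.

```python
def central_difference(f, x, h):
    """Two-sided difference quotient: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def adaptive_derivative(f, x, h0=1e-3, max_reductions=5):
    """Approximate f'(x), shrinking the increment by a factor of 10 each pass.

    Returns (derivative, used_smallest_increment). The flag is True when
    all reductions were carried out without the difference between
    successive approximations increasing.
    """
    h = h0
    previous = central_difference(f, x, h)
    previous_diff = None
    for _ in range(max_reductions):
        h /= 10.0
        current = central_difference(f, x, h)
        diff = abs(current - previous)
        if previous_diff is not None and diff > previous_diff:
            # Assumption: keep the estimate obtained before the difference grew.
            return previous, False
        previous, previous_diff = current, diff
    return previous, True
```

The second return value corresponds to the situation in which the program prints a message because the smallest increment was reached.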



3. Input, Output, Program Messages for User

Input

The description of the input for one or two samples with a maximum of
10 variables each is simple, and will be discussed first. The description
for more samples with fewer variables is more complex and will be discussed
later. The following is a description of the input cards required. The user
must follow the normal FORTRAN convention of right-adjusting all values
within the specified fields.
CARD 1    FORMAT (16I5)
  cols. 1-5     NSAMP    The number of independent samples of data.
                         Must be one or two.

CARD 2    FORMAT (10A8)
  cols. 1-80    TITLE    Title of first sample

CARD 3    FORMAT (16I5)
  cols. 1-5     NOBS1    Number of observations in first sample
  cols. 6-10    N1       Number of variables in first sample, maximum
                         of 10
  cols. 11-15   INPUT    = 0, input for first sample is raw data
                         = 1, input for first sample is sample means
                              and sample variance-covariance matrix

CARD 4    FORMAT (10A8)
  cols. 1-80    FMT      Variable format for reading data for first
                         sample. Will be used for either raw data or
                         means and variance-covariance matrix, depending
                         upon cols. 11-15 of card 3. Must specify a
                         floating-point format.

If INPUT = 0

CARD 5+   FORMAT (FMT)
  cols. 1-80    RAW(I), (I=1,...,N1)
                         Observations on N1 variables, one record
                         per set of N1 observations, with each set
                         of observations beginning a new record.
                         Punched according to format specified
                         on card 4.

If INPUT = 1

CARD 5+   FORMAT (FMT)
  cols. 1-80    X(I), (I=1,...,N1)
                         N1 sample means for first set of data, punched
                         according to format specified on card 4.

CARD 6+   FORMAT (FMT)
  cols. 1-80    S(I,J), (I=1,...,N1; J=I,...,N1)
                         Upper triangle, including the diagonal,
                         of the sample variance-covariance matrix.
                         This triangular matrix must be punched
                         row by row (each row begins with the
                         diagonal element) with each row
                         beginning a new record according to
                         format specified on card 4.

If cols. 1-5 of card 1 = 2, i.e., there are two samples, then cards 2
through 6+ are repeated for the second sample. The variable names which
change, and their meanings, are:

  NOBS2 - number of observations in the second sample
  N2    - number of variables in the second sample, maximum of 10
  X2    - means for second sample
  S2    - variance-covariance matrix for second sample

All other variables are reused for the second sample.

It is possible to increase the number of samples up to a maximum 
of 20, but one must also reduce the number of variables per sample, and 
specify that the covariances between variables from different samples are 
zero. The input cards remain essentially the same. In particular, card 1, 
NSAMP, must still be either one or two. 






Suppose we have two samples, each with one variable, i.e., two
univariate samples. If NOBS1 = NOBS2, exactly the same results can be
obtained from the program by considering the data as one bivariate sample
with $s_{12} = 0$. The input cards for two univariate samples with sample
means and covariances as input would be similar to



Card 1     cols. 1-5      NSAMP = 2
Card 2     cols. 1-80     TITLE
Card 3     cols. 1-5      NOBS1
           cols. 6-10     N1 = 1
           cols. 11-15    INPUT = 1
Card 4     cols. 1-80     FMT = (8F10.4) (say)
Card 5     cols. 1-10     X(1)
Card 6     cols. 1-10     S(1,1)
Card 2*    cols. 1-80     TITLE for 2nd sample
Card 3*    cols. 1-5      NOBS2
           cols. 6-10     N2 = 1
           cols. 11-15    INPUT = 1
Card 4*    cols. 1-80     FMT = (8F10.4)
Card 5*    cols. 1-10     X2(1)
Card 6*    cols. 1-10     S2(1,1)

* Indicates second sample.



Considering these data as one bivariate sample, the input cards would
be similar to

Card 1     cols. 1-5      NSAMP = 1
Card 2     cols. 1-80     TITLE
Card 3     cols. 1-5      NOBS1
           cols. 6-10     N1 = 2
           cols. 11-15    INPUT = 1
Card 4     cols. 1-80     FMT = (8F10.4)
Card 5     cols. 1-10     X(1)
           cols. 11-20    X(2)
Card 6(a)  cols. 1-10     S(1,1)
           cols. 11-20    S(1,2) = 0
Card 6(b)  cols. 1-10     S(2,2)

Further examples might be of some use here. Suppose the user has
four samples of the same size, each with two variables. This can be
handled by the program by considering these data to be one sample
(NSAMP = 1) with eight variables and the following sample variance-
covariance matrix:

$$
\begin{bmatrix}
s_{11} & s_{12} & 0 & 0 & 0 & 0 & 0 & 0 \\
 & s_{22} & 0 & 0 & 0 & 0 & 0 & 0 \\
 & & s_{33} & s_{34} & 0 & 0 & 0 & 0 \\
 & & & s_{44} & 0 & 0 & 0 & 0 \\
 & & & & s_{55} & s_{56} & 0 & 0 \\
 & & & & & s_{66} & 0 & 0 \\
 & & & & & & s_{77} & s_{78} \\
 & & & & & & & s_{88}
\end{bmatrix}
$$



Or we can consider these data as two samples (NSAMP = 2) with 4 variables
each and obtain the same result. The sample variance-covariance matrix
for the first sample would be

$$
\begin{bmatrix}
s_{11} & s_{12} & 0 & 0 \\
 & s_{22} & 0 & 0 \\
 & & s_{33} & s_{34} \\
 & & & s_{44}
\end{bmatrix}
$$

and for the second sample

$$
\begin{bmatrix}
s^*_{11} & s^*_{12} & 0 & 0 \\
 & s^*_{22} & 0 & 0 \\
 & & s^*_{33} & s^*_{34} \\
 & & & s^*_{44}
\end{bmatrix}
$$

* = second sample

By extension of these examples, it is clear that without changing the 
program, a maximum of 20 univariate samples can be handled if they are 
all of the same size. In this case, we would treat this as two (NSAMP = 2) 
samples, each with 10 variables. The sample variance-covariance matrix
would be similar for each sample: the main diagonal would contain the
variances, and all off-diagonal elements would be zero.
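
The bookkeeping in these examples, placing the covariance matrices of several independent samples along the diagonal of one larger matrix and then listing its upper triangle row by row, can be sketched as follows. This Python fragment is only an illustration; the helper names `block_diagonal` and `upper_triangle_rows` are hypothetical and are not part of the package.

```python
def block_diagonal(blocks):
    """Combine square covariance matrices of independent samples into one
    matrix, with zeros for covariances between different samples."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out

def upper_triangle_rows(matrix):
    """Rows of the upper triangle, each beginning with the diagonal element,
    in the order in which they would be punched."""
    return [row[i:] for i, row in enumerate(matrix)]
```

For four samples of two variables each, `block_diagonal` yields an eight-by-eight matrix, and `upper_triangle_rows` gives eight records of lengths 8 down to 1.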

Output 

The output from the program is largely self-explanatory. A title 
is printed at the top of the first page. The number of samples (NSAMP) 
is also printed, and will be either one or two. It will not reflect the 
actual number of samples if the user has taken advantage of the facilities 
described in the input section to expand the number of samples. Also, 
the time and date of the run are printed (see section 6). 

For each of two possible samples there is printed the title, number
of observations, number of variables, and the variable INPUT. In addition,
to provide the user with means of checking his input, the sample means
and upper right triangle of the sample variance-covariance matrix are
printed.

The estimated means and variance-covariance matrix are then printed.
These values will be identical to the sample values except for those
values for which the user has provided formulas for computing different
estimates in subroutine XIHAT (see section 4). This output is also
provided for user verification of his formulas.

The value of $\hat\xi$, computed with the estimated values, is then
printed. In addition, the partial derivatives of $\hat\xi$ with respect to
the means and variance-covariance matrices are printed for each sample.

Finally, $\hat\xi$ is printed again, along with
$\widehat{\operatorname{var}}(\hat\xi)$,
$\hat\sigma_{\hat\xi} = \sqrt{\widehat{\operatorname{var}}(\hat\xi)}$,
$\hat\xi/\hat\sigma_{\hat\xi}$, and the percentile at which
$\hat\xi/\hat\sigma_{\hat\xi}$ falls in a standard normal distribution.

Messages 

The program detects two conditions that will result in the printing 
of a message: 

1. If NSAMP is greater than two, the actual value of NSAMP that
the user supplied is printed and the program exits.

2. If a derivative is obtained from computation with the smallest
increment, a message is printed. The program uses the value of the last
derivative computed as the partial derivative, and continues on to the
computation of the next partial derivative.

4. Subroutine XIHAT

Subroutine XIHAT is the subroutine in which the user inserts coding for
his specific null hypothesis $\xi = 0$. If the user wishes to specify
estimates other than sample means and variance-covariances, he also inserts
the coding in this subroutine.

Variables containing information transmitted to this subroutine as
arguments are as follows (these names differ slightly from the corresponding
variable names in LASAHT):

Variable name   Dimension   Contents

X               10          Sample means for first sample
S               10 by 10    Sample variance-covariance matrix for first
                            sample
X2              10          Sample means for second sample
S2              10 by 10    Sample variance-covariance matrix for second
                            sample
N                           Number of variables in first sample
N2                          Number of variables in second sample
NOBS                        Number of observations for first sample
NOBS2                       Number of observations for second sample
NSAMP                       Either 1 or 2, same variable as card 1
                            of input

Variables containing information transmitted from this subroutine as
arguments are as follows:

Variable name   Dimension   Contents

M               10          Estimated means for first sample
$               10 by 10    Estimated variance-covariance matrix for
                            first sample
M2              10          Estimated means for second sample
$2              10 by 10    Estimated variance-covariance matrix for
                            second sample
XIHAT                       The value of $\hat\xi$ computed in this
                            subroutine
The FORTRAN statements down to 9999 transfer the sample means and
covariances into the estimated means and covariances. This ensures that
the estimated means and covariances will be the same as the sample values
unless modified by the user.

If the user wishes to specify other estimates for some or all of
the means and covariances, he must insert the FORTRAN code to do that
operation between the first two asterisk cards. He must follow these
rules:

1. In referring to any element of any covariance matrix, the
second subscript must be greater than or equal to the first,
i.e., only the upper right triangle is to be worked with.

2. The estimates supplied must appear to the left of an equal
sign.

3. The first line of FORTRAN code must not be a format statement,
and must have the statement number 100.

4. Statement numbers 9998 and 9999 must not be used.

The user must insert the FORTRAN code for the computation of $\hat\xi$
between the second set of asterisk cards. This code may consist of
a number of statements. The user may refer to any variable previously
defined. The following rules must be followed:

1. In referring to any element of any variance-covariance matrix,
the second subscript must be greater than or equal to the first.

2. Statement numbers 9998 and 9999 must not be used, and statement
numbers used in specifying estimates (if any) must not be
repeated.

3. If no estimates were specified, the first line of code must
not be a format statement, and must have the statement number
100.

4. The variable XIHAT must appear to the left of an equal sign.
5. Increasing the Number of Variables

The following vectors in the main program LASAHT have dimensions which
are dependent upon the number of variables in each sample. Let n1 indicate
the maximum number of variables in the first sample, and n2 be the
maximum number in the second sample.

  MSAVE(n1), $SAVE(n1,n1), M(n1), $(n1,n1), S(n1,n1), X(n1),
  M2SAVE(n2), $2SAVE(n2,n2), M2(n2), $2(n2,n2), S2(n2,n2), X2(n2)
  DER(max(n1(n1+1)/2, n2(n2+1)/2))
  TAU(n1 + n2 + n1**2 + n2**2)
  DOUT(max(n1,n2), max(n1,n2))
  INDEX(max(n1,n2))

The equivalencing of TAU to MSAVE, M2SAVE, $SAVE, and $2SAVE is arranged
so that MSAVE(1) is TAU(1), M2SAVE(1) is TAU(n1 + 1), $SAVE(1,1) is
TAU(n1 + n2 + 1), and $2SAVE(1,1) is TAU(n1 + n2 + n1**2 + 1).
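
The starting positions within TAU implied by this equivalencing can be tabulated with a short helper. The sketch below is in Python purely for illustration; the function name `tau_layout` is hypothetical, positions are 1-based as in FORTRAN, and the formulas are those stated in the paragraph above.

```python
def tau_layout(n1, n2):
    """1-based starting positions in TAU of the equivalenced arrays,
    together with the total length of TAU."""
    return {
        "MSAVE": 1,
        "M2SAVE": n1 + 1,
        "$SAVE": n1 + n2 + 1,
        "$2SAVE": n1 + n2 + n1 * n1 + 1,
        "length": n1 + n2 + n1 * n1 + n2 * n2,
    }
```

With the default maximum of 10 variables per sample, TAU has 220 elements and $2SAVE(1,1) falls at TAU(121).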

The user may change the maximum number of variables by using the above
formulas for determining the correct dimension and equivalence statements.
There are only three restrictions: (1) the maximum number of variables
must be the same for both samples; (2) the variable IDEM must be set



